
    The weight of phonetic substance in the structure of sound inventories

    In the research field initiated by Liljencrants & Lindblom in 1972, we illustrate the possibility of giving substance to phonology, that is, of predicting the structure of phonological systems from non-phonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). For vowel systems, we proposed the Dispersion-Focalisation Theory (DFT; Schwartz et al., 1997b). The DFT predicts vowel systems from two competing perceptual constraints weighted by two parameters, λ and α: the first aims at increasing auditory distances between vowel spectra (dispersion), the second at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on concepts from physics, namely the phase space (λ, α) and the polymorphism of a given phase, or superstructures in phonological organisations (Vallée et al., 1999), which allow us to generate 85.6% of the 342 UPSID systems with 3 to 7 vowel qualities. No comparable theory for consonants seems to exist yet. We therefore present a detailed typology of consonants and suggest ways to explain the predominance of plosives over fricatives and of voiceless over voiced consonants by (i) comparing them with language acquisition data at the babbling stage, and examining the capacity to acquire rather different linguistic systems in relation to the main degrees of freedom of the articulators; and (ii) showing that the places "preferred" for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, enable or preclude, the required articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000). We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values result in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world's languages can be explained, at least in part, by sensorimotor constraints, and argue that phonology can actually take part in a theory of Perception-for-Action-Control.
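    The competition between dispersion and focalisation can be sketched as a toy energy function over a vowel system. This is an illustrative reconstruction, not the actual DFT implementation: the perceptual distance metric, the focalisation term, and the example coordinates below are simplified stand-ins for the formulation of Schwartz et al. (1997b).

```python
import itertools

# Vowels are points in a schematic (F1, F2') perceptual space.
# lam and alpha are the weighting parameters named in the abstract.

def dispersion_energy(vowels):
    """Sum of inverse squared distances between all vowel pairs
    (lower when vowels are spread apart in the space)."""
    e = 0.0
    for (f1a, f2a), (f1b, f2b) in itertools.combinations(vowels, 2):
        d2 = (f1a - f1b) ** 2 + (f2a - f2b) ** 2
        e += 1.0 / d2
    return e

def focalisation_energy(vowels):
    """Negative term rewarding formant proximity within a vowel
    (F2 close to F1 here, as a stand-in for true formant convergence)."""
    return -sum(1.0 / ((f2 - f1) ** 2) for f1, f2 in vowels)

def dft_energy(vowels, lam=1.0, alpha=0.3):
    # Lower energy = better system: dispersed across vowels, focal within each.
    return lam * dispersion_energy(vowels) + alpha * focalisation_energy(vowels)

# A 3-vowel system /i a u/ in illustrative Bark-like (F1, F2') coordinates:
system = [(2.5, 14.0), (7.0, 9.0), (3.0, 6.0)]
print(dft_energy(system))
```

    Predicted systems are then those minimizing this energy for given (λ, α); varying the two weights traces out the phase space mentioned in the abstract.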

    Data and simulations about audiovisual asynchrony and predictability in speech perception

    Since a paper by Chandrasekaran et al. (2009), an increasing number of neuroscience papers have capitalized on the assumption that visual speech is typically 150 ms ahead of auditory speech. It turns out that the estimation of audiovisual asynchrony by Chandrasekaran et al. is valid only in very specific cases: for isolated CV syllables or at the beginning of a speech utterance. We present simple audiovisual data on plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise when syllables are chained in sequences, as they typically are in most parts of a natural speech utterance. We then discuss how the natural coordination between sound and image (combining cases of lead and lag of the visual input) is reflected in the so-called temporal integration window for audiovisual speech perception (van Wassenhove et al., 2007). We conclude with a computational proposal about predictive coding in such sequences, showing that the visual input may actually provide and enhance predictions even when it is roughly synchronous with the auditory input.
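    As a rough illustration of how such an asynchrony can be quantified, one can cross-correlate an acoustic envelope with a lip-aperture signal and take the best-matching lag. The signal names and the toy Gaussian pulses below are illustrative assumptions, not the measurement procedure of the paper.

```python
import numpy as np

def estimate_av_lag(audio_env, lip_aperture, fs):
    """Return the lag (in ms) maximizing the cross-correlation.
    Positive lag = the visual signal leads the audio."""
    a = audio_env - audio_env.mean()
    v = lip_aperture - lip_aperture.mean()
    corr = np.correlate(a, v, mode="full")
    lags = np.arange(-len(v) + 1, len(a))   # numpy's full-mode lag axis
    return 1000.0 * lags[np.argmax(corr)] / fs

# Toy signals at 100 Hz: a lip-opening pulse, then an acoustic burst
# 15 samples (150 ms) later, mimicking visual lead on an isolated syllable.
fs = 100
t = np.arange(300)
v = np.exp(-0.5 * ((t - 150) / 10.0) ** 2)   # lip aperture
a = np.exp(-0.5 * ((t - 165) / 10.0) ** 2)   # acoustic envelope
print(estimate_av_lag(a, v, fs))             # 150.0 ms visual lead
```

    In chained syllable sequences, the abstract's point is that this estimated lag shrinks toward zero, unlike the isolated-syllable case sketched here.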

    Perceptuo-motor biases in the perceptual organization of the height feature in French vowels

    Forthcoming in Acta Acustica. This paper reports on the organization of the perceived vowel space in French. In a previous paper [28], we investigated the implementation of vowel height contrasts along the F1 dimension by French speakers. Here, we present results from perceptual identification tests performed by twelve participants who had taken part in the production experiment reported in the earlier paper. For each subject, the stimuli presented in the identification test were synthesized in two different vowel spaces, corresponding to two different vocal tract lengths. The results showed, first, that perceived French vowels belonging to the same height degree were aligned on stable F1 values, independent of place of articulation and roundedness, as was the case for produced vowels. Second, the produced F1 distances between height degrees correlated with the perceived F1 distances. This suggests a link between perceptual and motor phonemic prototypes in the human brain. The results are discussed within the framework of the Perception-for-Action-Control (PACT) theory, in which speech units are considered to be gestures shaped by perceptual processes.

    Disentangling unisensory from fusion effects in the attentional modulation of McGurk effects: a Bayesian modeling study suggests that fusion is attention-dependent

    The McGurk effect has been shown to be modulated by attention. However, it remains unclear whether attentional effects are due to changes in unisensory processing or in the fusion mechanism itself. In this paper, we used published experimental data showing that distraction of visual attention weakens the McGurk effect to fit either the Fuzzy Logical Model of Perception (FLMP), in which the fusion mechanism is fixed, or a variant of it in which the fusion mechanism can vary depending on attention. The latter model was associated with a larger likelihood when assessed with a Bayesian model selection criterion. Our findings suggest that distraction of visual attention affects fusion by decreasing the weight of the visual input.
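    The two models compared in the abstract can be sketched as follows. The FLMP combines unisensory evidence multiplicatively with a fixed fusion rule; the variant raises the visual evidence to an attention-dependent exponent w, a common way to implement a variable visual weight. The exponent parameterization and all numbers below are illustrative assumptions, not necessarily the paper's exact formulation.

```python
def flmp(audio, visual, w=1.0):
    """FLMP-style fusion. audio, visual: dicts of per-response support
    in [0, 1]; w is a visual attention weight (w=1 recovers plain FLMP)."""
    fused = {r: audio[r] * (visual[r] ** w) for r in audio}
    total = sum(fused.values())
    return {r: s / total for r, s in fused.items()}

# Audio "ba" dubbed on visual "ga" (illustrative support values):
audio  = {"ba": 0.7,  "da": 0.2,  "ga": 0.1}
visual = {"ba": 0.05, "da": 0.35, "ga": 0.6}

print(flmp(audio, visual, w=1.0))  # full attention: fused "da" dominates
print(flmp(audio, visual, w=0.2))  # distracted: response follows the audio "ba"
```

    Fitting w per attention condition versus fixing w = 1 is what the Bayesian model selection in the abstract arbitrates between.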

    Binding and unbinding the auditory and visual streams in the McGurk effect

    Subjects presented with coherent auditory and visual streams generally fuse them into a single percept. This results in enhanced intelligibility in noise, or in visual modification of the auditory percept in the McGurk effect. It is classically considered that processing is done independently in the auditory and visual systems before interaction occurs at a certain representational stage, resulting in an integrated percept. However, some behavioral and neurophysiological data suggest the existence of a two-stage process. A first stage would bind together the appropriate pieces of audio and video information before fusion per se in a second stage. It should then be possible to design experiments leading to unbinding. It is shown here that if a given McGurk stimulus is preceded by an incoherent audiovisual context, the amount of McGurk effect is largely reduced. Various kinds of incoherent contexts (acoustic syllables dubbed onto video sentences, or phonetic or temporal modifications of the acoustic content of a regular sequence of audiovisual syllables) can significantly reduce the McGurk effect even when they are short (less than 4 s). The data are interpreted in the framework of a two-stage "binding and fusion" model for audiovisual speech perception.
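    The two-stage idea can be sketched as a binding state driven by recent audiovisual coherence, whose value then serves as the visual weight at the fusion stage. The leaky update rule, the linear fusion, and all numbers below are illustrative assumptions, not the model of the paper.

```python
def update_binding(state, coherent, rate=0.5):
    """Stage 1: leaky update of the binding state toward 1 on coherent
    audiovisual input, toward 0 on incoherent input."""
    target = 1.0 if coherent else 0.0
    return state + rate * (target - state)

def fuse(audio_evidence, visual_evidence, binding_state):
    """Stage 2: fusion with a binding-dependent visual weight."""
    w = binding_state
    return (1.0 - w) * audio_evidence + w * visual_evidence

state = 1.0                              # fully bound at the start
for coherent in [False, False, False]:   # a short incoherent context
    state = update_binding(state, coherent)

print(state)                  # binding state decays: 1.0 -> 0.5 -> 0.25 -> 0.125
print(fuse(0.2, 0.9, state))  # fused evidence now dominated by the audio input
```

    On this sketch, a few incoherent context syllables are enough to unbind the streams, matching the observation that contexts under 4 s already reduce the McGurk effect.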

    A Bayesian framework for speech motor control

    The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables similar acoustic properties to be produced with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the experimentally observed intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first, and the Bayesian model is then constructed from them in a progressive way. The performance of the Bayesian model is evaluated through computer simulations and compared to that of the optimal control model. The approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
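    The contrast between a single optimum and posterior sampling can be sketched on a toy one-dimensional problem. Everything below (the forward model `acoustic`, the Gaussian likelihood and prior, the Metropolis sampler) is an illustrative assumption, not the GEPPETO model or the paper's Bayesian formulation.

```python
import math
import random

random.seed(0)

def acoustic(m):
    """Toy forward model: motor command -> acoustic (formant-like) value."""
    return 500.0 + 300.0 * m

def log_posterior(m, target, sigma_acoustic=20.0, sigma_prior=1.0):
    """log P(m | target) up to a constant: acoustic match + effort prior."""
    like = -0.5 * ((acoustic(m) - target) / sigma_acoustic) ** 2
    prior = -0.5 * (m / sigma_prior) ** 2   # prefer commands near rest
    return like + prior

def plan(target, n_steps=2000, step=0.1):
    """Metropolis sampling: each call returns one plausible command,
    so repeated calls give token-to-token variability by construction."""
    m = 0.0
    for _ in range(n_steps):
        cand = m + random.gauss(0.0, step)
        if math.log(random.random()) < log_posterior(cand, target) - log_posterior(m, target):
            m = cand
    return m

tokens = [plan(800.0) for _ in range(5)]
print(tokens)   # five slightly different commands for the same target
```

    A cost-function planner would return the single argmax of the same log-posterior every time; sampling from it instead is what reconciles planning with intra-speaker variability in this sketch.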

    Modulating fusion in the McGurk effect by binding processes and contextual noise

    In a series of experiments, we showed that the McGurk effect may be modulated by context: applying incoherent auditory and visual material before an audiovisual target made of an audio "ba" and a video "ga" significantly decreases the McGurk effect. We interpreted this as evidence for an audiovisual "binding" stage controlling the fusion process: incoherence would produce "unbinding" and decrease the weight of the visual input in the fusion process. In this study, we further explore this binding stage through two experiments. First, we test the "rebinding" process by presenting a short period of either coherent material or silence after the incoherent "unbinding" context. We show that coherence provides "rebinding", resulting in a recovery of the McGurk effect. In contrast, silence provides no rebinding and hence "freezes" the unbinding process, resulting in no recovery of the McGurk effect. Capitalizing on this result, in a second experiment including an incoherent unbinding context followed by a coherent rebinding context before the target, we added noise over the whole contextual period, though not over the McGurk target. It appears that noise uniformly increases the rate of McGurk responses compared with the silent condition. This suggests that contextual noise increases the weight of the visual input in fusion, even when there is no noise within the target stimulus where fusion is applied. We conclude on the role of audiovisual coherence and noise in the binding process, within the framework of audiovisual speech scene analysis and the cocktail party effect.

    Sensory-motor interactions in speech perception, production and imitation: behavioral evidence from close shadowing, perceptuo-motor phonemic organization and imitative changes.

    Speech communication can be viewed as an interactive process involving a functional coupling between sensory and motor systems. In the present study, we combined three classical experimental paradigms to further test perceptuo-motor interactions in both speech perception and production. In a first, close-shadowing experiment, auditory and audiovisual syllable identification led to faster oral than manual responses. In a second experiment, participants were asked to produce and to listen to French vowels varying in the height feature, in order to test perceptuo-motor phonemic organization and idiosyncrasies. In a third experiment, online imitative changes in fundamental frequency in relation to acoustic vowel targets were observed in a non-interactive communication situation, during both unintentional and voluntary imitative production tasks. Altogether, our results are fully in line with a functional coupling between the action and perception speech systems and provide further evidence for the sensory-motor nature of speech representations.

    Effect of context, rebinding and noise, on audiovisual speech fusion

    In a previous set of experiments, we showed that audiovisual fusion in the McGurk effect may be modulated by context: a short context (2 to 4 syllables) composed of incoherent auditory and visual material significantly decreases the McGurk effect. We interpreted this as evidence for an audiovisual "binding" stage controlling the fusion process, and we also showed the existence of a "rebinding" process when incoherent material is followed by short coherent material. In this work, we evaluate the role of acoustic noise superimposed on the context and on the rebinding material. We use either a coherent or an incoherent context, followed, if incoherent, by a variable amount of coherent "rebinding" material, in two conditions: either silent or with superimposed speech-shaped noise. The McGurk target is presented without acoustic noise. We confirm the existence of unbinding (a weaker McGurk effect with an incoherent context) and of rebinding (the McGurk effect is recovered with coherent rebinding). Noise uniformly increases the rate of McGurk responses compared with the silent condition. We conclude on the role of audiovisual coherence and noise in the binding process, within the framework of audiovisual speech scene analysis and the cocktail party effect.